On the Convergence of Stochastic Iterative Dynamic Programming Algorithms



Similar Resources

Convergence of Stochastic Iterative Dynamic Programming Algorithms

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful te...


On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD algorithm of Sutton and the Q-learning algorithm of Watkins, can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-ba...
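
The abstract above frames TD and Q-learning as sampled approximations to DP. The minimal sketch below illustrates that relationship by running exact value iteration and tabular Q-learning side by side on a small made-up MDP. The names `R`, `P0`, `GAMMA`, `value_iteration` and `q_learning` are hypothetical and do not come from the paper. The step-size schedule n(s,a)^(-omega) is one common choice that satisfies the Robbins-Monro conditions that stochastic-approximation arguments rely on. This is an illustration, not the paper's proof or experimental setup.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration.
# R[s][a]: deterministic reward; P0[s][a]: probability that the next state is 0.
R = [[1.0, 0.0], [0.0, 0.5]]
P0 = [[0.8, 0.2], [0.3, 0.9]]
GAMMA = 0.5
STATES, ACTIONS = range(2), range(2)


def value_iteration(iters=200):
    """Exact DP: repeatedly apply the Bellman optimality backup to Q."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = [max(q[s]) for s in STATES]
        q = [[R[s][a] + GAMMA * (P0[s][a] * v[0] + (1 - P0[s][a]) * v[1])
              for a in ACTIONS] for s in STATES]
    return q


def q_learning(steps=200_000, omega=0.7, seed=0):
    """Q-learning: a sampled, incremental version of the same backup.

    The step size alpha = n(s,a) ** -omega with 0.5 < omega <= 1 satisfies
    sum(alpha) = inf and sum(alpha ** 2) < inf (Robbins-Monro conditions).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    for _ in range(steps):
        # Every state-action pair must keep being updated; here they are drawn uniformly.
        s, a = rng.randrange(2), rng.randrange(2)
        s_next = 0 if rng.random() < P0[s][a] else 1  # sample one transition
        visits[s][a] += 1
        alpha = visits[s][a] ** -omega
        target = R[s][a] + GAMMA * max(q[s_next])  # one-sample Bellman target
        q[s][a] += alpha * (target - q[s][a])
    return q


if __name__ == "__main__":
    exact, learned = value_iteration(), q_learning()
    err = max(abs(exact[s][a] - learned[s][a]) for s in STATES for a in ACTIONS)
    print(f"max |Q* - Q_learned| = {err:.4f}")
```

Drawing state-action pairs uniformly is a simplification that ensures every pair is updated infinitely often. An agent that follows a behaviour policy would need its own exploration to provide the same guarantee.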


Soft Dynamic Programming Algorithms: Convergence Proofs

Algorithms based on dynamic programming (DP) find optimal solutions to finite-state optimal control tasks by iterating a "backup" operator that only considers the consequences of executing the "best" action in a state. In many problem domains, the optimal solution may be "brittle" and it may be desirable to find robust, if suboptimal, solutions that prefer states that have many "good" actions to choo...
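
This abstract is truncated, so the paper's exact soft operator is not shown here. As an assumed stand-in, the sketch below contrasts the standard max backup with a Boltzmann-weighted average of action values. `soft_backup` and its temperature `tau` are hypothetical. Both example states have the same best action value, so the max backup cannot tell them apart. The soft backup ranks the state with many good actions higher.

```python
import math


def hard_backup(q_values):
    """Standard DP backup: a state is worth the value of its single best action."""
    return max(q_values)


def soft_backup(q_values, tau=0.5):
    """Illustrative soft backup: Boltzmann-weighted average of action values.

    tau is a hypothetical temperature; as tau -> 0 this approaches max(q_values).
    """
    m = max(q_values)  # shift by the max so exp() cannot overflow
    weights = [math.exp((q - m) / tau) for q in q_values]
    return sum(w * q for w, q in zip(weights, q_values)) / sum(weights)


brittle = [1.0, 0.0, 0.0, 0.0]  # one good action, three bad ones
robust = [1.0, 0.9, 0.9, 0.9]   # several nearly-as-good actions

print(hard_backup(brittle), hard_backup(robust))  # prints: 1.0 1.0
print(soft_backup(brittle) < soft_backup(robust))  # prints: True
```

Weighting by exp(q/tau) is only one way to prefer states with many good actions. The operator whose convergence the paper actually proves may differ.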


Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming

We consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, we derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associat...
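
One possible reading of "sample path estimates" is as follows: estimate each transition probability from simulated transitions, then solve the finite-horizon problem by backward induction on the estimated model. The sketch below assumes a hypothetical two-state, two-action problem with horizon 3. The names `R`, `P0`, `HORIZON`, `backward_induction` and `sample_path_estimate` are all illustrative. The script prints whether the estimated policy matches the true one, alongside the error in the estimated optimal values. The abstract contrasts the policy's exponential convergence with the slower inverse square-root rate it mentions.

```python
import random

# Hypothetical finite-horizon problem (illustration only):
# R[s][a] is a deterministic reward, P0[s][a] the probability of moving to state 0.
R = [[1.0, 0.0], [0.0, 0.5]]
P0 = [[0.8, 0.2], [0.3, 0.9]]
HORIZON = 3


def backward_induction(p0):
    """Finite-horizon DP on model p0; returns (policy[t][s], optimal values at t=0)."""
    v = [0.0, 0.0]  # terminal values
    policy = []
    for _ in range(HORIZON):
        q = [[R[s][a] + p0[s][a] * v[0] + (1 - p0[s][a]) * v[1]
              for a in range(2)] for s in range(2)]
        policy.append([max(range(2), key=lambda a: q[s][a]) for s in range(2)])
        v = [max(q[s]) for s in range(2)]
    policy.reverse()  # stages were computed from the last one back to the first
    return policy, v


def sample_path_estimate(n_samples, seed=0):
    """Estimate each P0[s][a] as the fraction of n_samples simulated moves to state 0."""
    rng = random.Random(seed)
    return [[sum(rng.random() < P0[s][a] for _ in range(n_samples)) / n_samples
             for a in range(2)] for s in range(2)]


if __name__ == "__main__":
    true_policy, true_v = backward_induction(P0)
    for n in (10, 100, 1000):
        est_policy, est_v = backward_induction(sample_path_estimate(n))
        value_err = max(abs(x - y) for x, y in zip(est_v, true_v))
        print(f"n={n}: policy matches={est_policy == true_policy}, value error={value_err:.4f}")
```

In this toy problem the action gaps are large, so the estimated policy is correct even when the value estimates are still noticeably off. Near-ties between actions are where sampling error actually changes the policy.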


Convergence of Numerical Method for Multistate Stochastic Dynamic Programming

Convergence of corrections is examined for a predictor-corrector method to solve Bellman equations of multi-state stochastic optimal control in continuous time. Quadratic costs and constrained control are assumed. A heuristically linearized comparison equation makes the nonlinear, discontinuous Bellman equation amenable to linear convergence analysis. Convergence is studied using the Fourier sta...



Journal

Title: Neural Computation

Year: 1994

ISSN: 0899-7667,1530-888X

DOI: 10.1162/neco.1994.6.6.1185